{
    "cells": [
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "bcddce7b",
            "metadata": {
                "scrolled": true
            },
            "outputs": [],
            "source": [
                "# download presidio\n",
                "!pip install presidio_analyzer presidio_anonymizer\n",
                "!python -m spacy download en_core_web_lg"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "3345f1c4",
            "metadata": {},
            "source": [
                "####### Path to notebook: [https://www.github.com/microsoft/presidio/blob/main/docs/samples/python/integrating_with_external_services.ipynb](https://www.github.com/microsoft/presidio/blob/main/docs/samples/python/integrating_with_external_services.ipynb)"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "animated-title",
            "metadata": {},
            "source": [
                "# Integrating external models/services with Presidio\n",
                "\n",
                "Presidio analyzer is comprised of a set of PII recognizers which can run local or remotely. \n",
                "In this notebook we'll give an example of integrating an external service into Presidio-Analyzer.\n",
                "\n",
                "## Azure Text Analytics\n",
                "\n",
                "Azure Text Analytics is a cloud-based service that provides advanced natural\n",
                "language processing over raw text. One of its main functions includes \n",
                "Named Entity Recognition (NER), which has the ability to identify different\n",
                "entities in text and categorize them into pre-defined classes or types.\n",
                "\n",
                "### Supported entity categories in the Text Analytics API\n",
                "Text Analytics supports multiple PII entity categories. The Text Analytics service\n",
                "runs a predictive model to identify and categorize named entities from an input\n",
                "document. The service's latest version includes the ability to detect personal (PII)\n",
                "and health (PHI) information. A list of all supported entities can be found in the\n",
                "[official documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal).\n",
                "\n",
                "### Prerequisites\n",
                "To use Text Analytics with Preisido, an Azure Text Analytics resource should\n",
                "first be created under an Azure subscription. Follow the [official documentation](https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/how-tos/text-analytics-how-to-call-api?tabs=synchronous#create-a-text-analytics-resource)\n",
                "for instructions. The key and endpoint, generated once the resource is created, should replace the placeholders `<YOUR_TEXT_ANALYTICS_KEY>` and `<YOUR_TEXT_ANALYTICS_ENDPOINT>` in this notebook, respectively. \n",
                "## Text Analytics Recognizer\n",
                "In this example we will use the [`TextAnalyticsRecognizer`](https://github.com/microsoft/presidio/blob/main/docs/samples/python/text_analytics/example_text_analytics_recognizer.py) sample implementation. This class extends Presidio's [Remote Recognizer](https://microsoft.github.io/presidio/analyzer/adding_recognizers/#creating-a-remote-recognizer) for calling the Text Analytics service REST API. For additional information of a remote recognizer, see the [ExampleRemoteRecognizer](https://github.com/microsoft/presidio/blob/main/docs/samples/python/example_remote_recognizer.py) sample."
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "mature-break",
            "metadata": {},
            "outputs": [],
            "source": [
                "from presidio_analyzer import AnalyzerEngine\n",
                "from text_analytics.example_text_analytics_recognizer import TextAnalyticsEntityCategory, TextAnalyticsRecognizer"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "thick-separate",
            "metadata": {},
            "source": [
                "1. Define which entities to get from Text Analytics"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "super-jaguar",
            "metadata": {},
            "outputs": [],
            "source": [
                "ta_entities = [\n",
                "    TextAnalyticsEntityCategory(name=\"Person\",\n",
                "                                entity_type=\"NAME\",\n",
                "                                supported_languages=[\"en\"]),\n",
                "    TextAnalyticsEntityCategory(name=\"Age\",\n",
                "                                entity_type=\"AGE\",\n",
                "                                subcategory = \"Age\", \n",
                "                                supported_languages=[\"en\"]),\n",
                "    TextAnalyticsEntityCategory(name=\"InternationlBankingAccountNumber\",\n",
                "                                entity_type=\"IBAN\",\n",
                "                                supported_languages=[\"en\"])]"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "thermal-liverpool",
            "metadata": {},
            "source": [
                "For a full list of entities: https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/named-entity-types?tabs=personal"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "competent-probe",
            "metadata": {},
            "source": [
                "2. Instantiate the remote recognizer object (In this case `TextAnalyticsRecognizer`)"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "simple-fundamentals",
            "metadata": {},
            "outputs": [],
            "source": [
                "text_analytics_recognizer = TextAnalyticsRecognizer(\n",
                "        text_analytics_key=\"<YOUR_TEXT_ANALYTICS_KEY>\",\n",
                "        text_analytics_endpoint=\"<YOUR_TEXT_ANALYTICS_ENDPOINT>\",\n",
                "        text_analytics_categories = ta_entities)"
            ]
        },
        {
            "cell_type": "markdown",
            "id": "novel-mission",
            "metadata": {},
            "source": [
                "3. Add the new recognizer to the list of recognizers and run the `PresidioAnalyzer`"
            ]
        },
        {
            "cell_type": "code",
            "execution_count": null,
            "id": "permanent-samuel",
            "metadata": {},
            "outputs": [],
            "source": [
                "analyzer = AnalyzerEngine()\n",
                "analyzer.registry.add_recognizer(text_analytics_recognizer)\n",
                "\n",
                "results = analyzer.analyze(\n",
                "    text=\"David is 30 years old. His IBAN: IL150120690000003111111\", language=\"en\"\n",
                ")\n",
                "print(results)"
            ]
        }
    ],
    "metadata": {
        "kernelspec": {
            "display_name": "presidio",
            "language": "python",
            "name": "presidio"
        },
        "language_info": {
            "codemirror_mode": {
                "name": "ipython",
                "version": 3
            },
            "file_extension": ".py",
            "mimetype": "text/x-python",
            "name": "python",
            "nbconvert_exporter": "python",
            "pygments_lexer": "ipython3",
            "version": "3.7.9"
        }
    },
    "nbformat": 4,
    "nbformat_minor": 5
}
